University of Wisconsin Department of Biostatistics and Medical Informatics

Authors

  • John A. Dawson
  • Christina Kendziorski
Abstract

Two challenging problems in the clinical study of cancer are the characterization of cancer subtypes and the classification of individual patients according to those subtypes. Statistical approaches addressing these problems are hampered by population heterogeneity and challenges inherent in data integration across high-dimensional, diverse covariates. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these concerns. LDA models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a "document" with "text" constructed from clinical and high-dimensional genomic measurements. We then further extend the framework to allow for supervision by a time-to-event response. The model enables the efficient identification of collections of clinical and genomic features that co-occur within patient subgroups, and then characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas (TCGA) ovarian project identifies informative patient subgroups that are characterized by different propensities for exhibiting abnormal mRNA expression and methylations, corresponding to differential rates of survival from primary therapy.

1. Introduction.

Technological advances continue to increase both the ease and accuracy with which measurements of the genome and phenome can be obtained and, consequently, genomic-based studies of disease often involve highly diverse types of data collected on large groups of patients. The primary goals of such studies involve identifying genomic features useful for characterizing patient subgroups as well as predicting patient-specific disease course and/or likelihood of response to treatment.
Doing so requires statistical methods that handle complex interactions, accommodate population heterogeneity, and allow for data integration across multiple sources. A number of statistical methods are available for survival-related feature identification and prediction (for a review, see Witten and Tibshirani (2009); Li and Li (2004); Wei and Li (2007)). Most often, classical models for a survival response are coupled with some dimension-reduction method, providing a concise representation of the genomic features affecting patient outcome. Although useful, the majority of these methods identify a set of covariates common to all patients and as a result may "distort what is observed" in the presence of population heterogeneity (Aalen, 1988). Survival-supervised clustering approaches naturally accommodate heterogeneity, providing for efficient and effective identification of patient subgroups (Dettling and Bühlmann, 2002; Li and Gui, 2004). However, these approaches do not identify salient features associated with subgroups; …
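As a rough illustration of the abstract's idea of treating each patient as a "document" whose "text" is built from clinical and genomic measurements, the sketch below turns per-patient values into word tokens and bag-of-words counts. The excerpt does not say how survLDA constructs its vocabulary, so the z-score cutoff, the token naming scheme, and the feature names (`GENE_A`, `GENE_B`) are all assumptions made for illustration. Topic fitting and the survival supervision are not shown.

```python
from collections import Counter

# Hypothetical cutoff: a feature with |z| at or above this is called "abnormal".
ABNORMAL_Z = 2.0


def patient_to_document(clinical, zscores, cutoff=ABNORMAL_Z):
    """Return the list of word tokens that make up one patient's "document".

    clinical: dict of categorical clinical covariates, e.g. {"stage": "III"}.
    zscores:  dict of standardized genomic measurements keyed by feature name.
    """
    # Each clinical covariate contributes one "name=value" word.
    words = [f"{name}={value}" for name, value in sorted(clinical.items())]
    # Each abnormal genomic feature contributes one word; normal ones add none.
    for feature, z in sorted(zscores.items()):
        if z >= cutoff:
            words.append(f"{feature}_high")
        elif z <= -cutoff:
            words.append(f"{feature}_low")
    return words


def build_corpus(patients):
    """Map patient id -> bag-of-words counts over that patient's tokens."""
    return {
        pid: Counter(patient_to_document(clinical, zscores))
        for pid, (clinical, zscores) in patients.items()
    }


# Made-up toy data, not taken from TCGA.
patients = {
    "P1": ({"stage": "III"}, {"mRNA:GENE_A": 2.7, "meth:GENE_B": -0.3}),
    "P2": ({"stage": "IV"}, {"mRNA:GENE_A": -2.4, "meth:GENE_B": 3.1}),
}
corpus = build_corpus(patients)
```

A corpus in this form (one bag of words per patient) is the kind of input a standard LDA implementation expects; the survival-supervised extension described in the abstract would additionally tie each patient's topic proportions to the time-to-event response.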





Publication date: 2011